Skip to content

Fix deserialization of unions #332

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Mar 15, 2021
Merged

Conversation

forest-benchling
Copy link
Collaborator

@forest-benchling forest-benchling commented Feb 9, 2021

Deserialization of unions in from_dict was not working properly, because it failed to account for cases where the union property is None or Unset. This PR adds explicit checks for None or Unset.

In addition, the type of the value to deserialize was previously set to Any, which was masking type errors. In order to fix these type errors, I introduced a new concept of check_type_for_construct for each property, which allows the type to be checked before attempting deserialization.

In the future, I think we should refine the try/except pattern; it's not good practice to rely on a catch-all except when trying to deserialize each type within the union, because we don't know what exception was actually raised.

As part of this PR, I also refactored the type_string method to reuse logic across properties. Also as part of this PR, I had include #335, which had a circular dependency on this PR:

Previously, if you had a oneOf of inlined models, each inline model was generated with the same name, causing effectively only the last model to be included (since it overwrote all the previous models).

In this PR, we give it an autogenerated name, to avoid the name clash.

@forest-benchling forest-benchling marked this pull request as ready for review February 9, 2021 23:08
@forest-benchling
Copy link
Collaborator Author

@dbanty I opened a new PR, since the git history and merge conflicts with the template renaming (adding jinja) was too hard to reconcile.

@codecov
Copy link

codecov bot commented Feb 9, 2021

Codecov Report

Merging #332 (0d90970) into main (2861a34) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##              main      #332   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           47        47           
  Lines         1410      1420   +10     
=========================================
+ Hits          1410      1420   +10     
Impacted Files Coverage Δ
...penapi_python_client/parser/properties/__init__.py 100.00% <100.00%> (ø)
...i_python_client/parser/properties/enum_property.py 100.00% <100.00%> (ø)
..._python_client/parser/properties/model_property.py 100.00% <100.00%> (ø)
...penapi_python_client/parser/properties/property.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2861a34...0d90970. Read the comment docs.

Copy link
Collaborator

@dbanty dbanty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, nice to see the None/Unset type string refactor make it back in! I have a few suggestions and Codecov seems to be saying we're missing a unit test somewhere.

Comment on lines +92 to +94
not_required_one_of_models = UNSET
if not isinstance(self.not_required_one_of_models, Unset):
not_required_one_of_models = self.not_required_one_of_models.to_dict()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will have to check later why this is the case. I assume there's some difficulty with handling the type in the template which is why we're checking multiple isinstances here.

Copy link
Collaborator Author

@forest-benchling forest-benchling Feb 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the OpenAPI:

          "not_required_one_of_models": {
            "oneOf": [
              {
                "ref": "#components/schemas/FreeFormModel"
              },
              {
                "ref": "#components/schemas/ModelWithUnionProperty"
              }
            ],
            "nullable": false
          },

My guess is that the inner properties are not required? Kind of confusing what the semantics should be...

Comment on lines 173 to 175
if not isinstance(data, str):
raise TypeError()
a_camel_date_time = isoparse(data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think letting isoparse raise the error (if we're going to raise it anyway) would be better because it would give more context as to what is wrong. It's especially confusing to readers to see this since the input type of the function is str, so why would this not be one?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, this is here to get the correct Union option more easily? Maybe then we need to exclude it if no Union? Not really necessary for this PR, there are plenty of things like this that need cleanup at some point.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, this is just a special case when there's a union of only 1 element. I think it's pretty unlikely to happen in actual usage. Let me try to fix it though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out to be unnecessary if we're changing the input type to be object.

Comment on lines 195 to 197
if not isinstance(data, dict):
raise TypeError()
one_of_models = FreeFormModel.from_dict(data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here re: confusion. If we can't be confident in the data passed in, the type annotation should be : Any and we do the runtime check. If we are confident of what's passed in, we can do the correct type annotation and not a runtime check.

I think the only reason this would be the wrong data type is if the OpenAPI spec was wrong or someone made a breaking change to the API. Either way we can't properly handle all the cases that could fail. So I think just the proper type annotation without the runtime check is probably the better approach?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good point. Actually initially I didn't have a runtime check, but then MyPy throws an error, because it's not type-safe.

I'm starting to think that maybe the original Any annotation is better, since we can't be confident in the data passed in. But I think it should be object, not Any, because Any causes type errors to be ignored, whereas object will force us to be type-safe.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good regarding object vs Any if that ends up working better.

nullable_model = None
_nullable_model = d.pop("nullable_model")
if _nullable_model is not None:
nullable_model = AModelNullableModel.from_dict(cast(Dict[str, Any], _nullable_model))
nullable_model = AModelNullableModel.from_dict(_nullable_model)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised mypy doesn't get mad about us passing an Any to a function which takes Dict[str, Any]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's because Any has weird semantics, in that it basically just means "ignore any type errors". If we were to change the signature to object, then it would error.

Comment on lines 172 to 175
def _get_inner_type_strings(self, json: bool = False) -> List[str]:
inner_types = [p.get_type_string(no_optional=True, json=json) for p in self.inner_properties]
unique_inner_types = list(dict.fromkeys(inner_types))
return unique_inner_types
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _get_inner_type_strings(self, json: bool = False) -> List[str]:
inner_types = [p.get_type_string(no_optional=True, json=json) for p in self.inner_properties]
unique_inner_types = list(dict.fromkeys(inner_types))
return unique_inner_types
def _get_inner_type_strings(self, json: bool = False) -> Set[str]:
return {p.get_type_string(no_optional=True, json=json) for p in self.inner_properties}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I tried this, but IIRC it was causing the codegen to be non-deterministic 😬 (or at least unpredictable)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. We could, when actually joining them (I think that's the only place they are used for output?) add a sorted to make it deterministic so it passes e2e tests?

Comment on lines 189 to 199
def get_type_strings_in_union(self, no_optional: bool = False, json: bool = False) -> List[str]:
type_strings = self._get_inner_type_strings(json=json)

if no_optional or (self.required and not self.nullable):
return type_strings
elif self.required and self.nullable:
return ["None"] + type_strings
elif not self.required and self.nullable:
return ["Unset", "None"] + type_strings
else:
return ["Unset"] + type_strings
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def get_type_strings_in_union(self, no_optional: bool = False, json: bool = False) -> List[str]:
type_strings = self._get_inner_type_strings(json=json)
if no_optional or (self.required and not self.nullable):
return type_strings
elif self.required and self.nullable:
return ["None"] + type_strings
elif not self.required and self.nullable:
return ["Unset", "None"] + type_strings
else:
return ["Unset"] + type_strings
def get_type_strings_in_union(self, no_optional: bool = False, json: bool = False) -> Set[str]:
type_strings = self._get_inner_type_strings(json=json)
if no_optional:
return type_strings
if self.nullable:
type_strings.add("None")
if not self.required:
type_strings.add("Unset")
return type_strings

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Same as above, concern about non-determinism)

{{ property.python_name }} = {{ property.reference.class_name }}(_{{ property.python_name }})
{% endif %}
{% endmacro %}

{% macro check_type_for_construct(property, source) %}{% if property.value_type == str %}isinstance({{ source }}, str){% else %}isinstance({{ source }}, int){% endif %}{% endmacro %}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{% macro check_type_for_construct(property, source) %}{% if property.value_type == str %}isinstance({{ source }}, str){% else %}isinstance({{ source }}, int){% endif %}{% endmacro %}
{% macro check_type_for_construct(property, source) %}isinstance({{ source }}, {{ property.value_type }}){% endmacro %}

@@ -11,6 +11,9 @@ class Unset:

UNSET: Unset = Unset()

# Used as `FileProperty._json_type_string`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Used as `FileProperty._json_type_string`
{# Used as `FileProperty._json_type_string` #}

No reason to have that comment make it to generated code I think

@forest-benchling
Copy link
Collaborator Author

if len(inner_types) == 1:
return inner_types[0]
else:
return f"Union[{', '.join(inner_types)}]"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return f"Union[{', '.join(inner_types)}]"
return f"Union[{', '.join(sorted(inner_types))}]"

I think this will fix the determinism issue? While also maintaining the performance and reduced code of using sets.

@forest-benchling
Copy link
Collaborator Author

@dbanty Sorry, I had to lump #335 into this PR, because of a circular dependency. By changing adding an explicit type check in the parsing code, this PR exposes a type error fixed by #335, while #335 depends on fixing the JSON type as done in this PR.

Copy link
Collaborator

@dbanty dbanty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, I think the only required change is to cover the rest of the union type functions with unit tests.

Comment on lines +179 to +187
return inner_types.pop()
else:
return f"Union[{', '.join(sorted(inner_types))}]"

def get_base_type_string(self) -> str:
return self._get_type_string_from_inner_type_strings(self._get_inner_type_strings(json=False))

def get_base_json_type_string(self) -> str:
return self._get_type_string_from_inner_type_strings(self._get_inner_type_strings(json=True))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing code coverage on a couple of these lines

sub_prop, schemas = property_from_data(
name=name, required=required, data=sub_prop_data, schemas=schemas, parent_name=parent_name
name=f"{name}_item{i}", required=required, data=sub_prop_data, schemas=schemas, parent_name=parent_name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally "item" makes me think of a collection (e.g. list) and I think "type" might be more clear? But that's very subjective so up to you.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we should probably set required=True here. I think that would solve our extra Unset checks in the generated code. Really either the whole Union is required / nullable or None of it is. Not up to the individual types I don't think.

Would changing this cause any other problems you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds reasonable, but is it OK if we punt changing required=True to a future PR, to keep the scope of this PR limited, unless it's necessary here?

@@ -8,6 +8,8 @@
{{ construct_template(construct_function, property, source, initial_value=initial_value) }}
{% endmacro %}

{% macro check_type_for_construct(property, source) %}isinstance({{ source }}, dict){% endmacro %}

{% macro transform(property, source, destination, declare_type=True) %}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need a better name for this macro than "transform" at some point 😅even I can't remember what this one is for sometimes. Not a note on this PR, just a note on how I need to put some design thought into the templates.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, yeah...maybe serialize and deserialize?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or to_json and from_json...

Comment on lines +3 to +10
{% if "None" in property.get_type_strings_in_union(json=True) %}
if data is None:
return data
{% endif %}
{% if "Unset" in property.get_type_strings_in_union(json=True) %}
if isinstance(data, Unset):
return data
{% endif %}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use property.required and property.nullable for this? Maybe that's not robust enough, not sure.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that depends on the unrolling unions thing you had mentioned below, since a nested property could still be nullable. And technically I guess you could have a none-type in the union.

{% from "property_templates/" + inner_property.template import construct %}
{{ inner_property.python_name }}: {{ inner_property.get_type_string() }}
{% from "property_templates/" + inner_property.template import construct, check_type_for_construct %}
if not {{ check_type_for_construct(inner_property, "data") }}:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory we could start doing:

if {{ check_type_for_construct(inner_property, "data") }}:
    {{ construct(inner_property, "data", initial_value="UNSET") }}
    return {{ inner_property.python_name }}

and avoid the try/except stuff, yeah?

Copy link
Collaborator Author

@forest-benchling forest-benchling Mar 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would work out-of-the-box, because we're also catching non-type errors, like parsing dates (isoparse("foo")). Agreed that ideally I think it would be more robust for us eventually move when possible away from the try/except (especially with the generic exception catching).

@@ -24,20 +36,25 @@ def _parse_{{ property.python_name }}(data: Any) -> {{ property.get_type_string(
{{ property.python_name }} = _parse_{{ property.python_name }}({{ source }})
{% endmacro %}

{# For now we assume there will be no unions of unions #}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future we should probably be unrolling unions while building the properties. I don't imagine that would be too difficult, just a check on each inner_property to see if it's a UnionProperty.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, that makes sense.

@forest-benchling forest-benchling requested a review from dbanty March 3, 2021 00:50
@forest-benchling
Copy link
Collaborator Author

Hi @dbanty, sorry for the delay, just got back from vacation. I addressed two of your comments, and the rest I was wondering if we can leave to be follow-ups in order to push this through?

dbanty
dbanty previously approved these changes Mar 13, 2021
Copy link
Collaborator

@dbanty dbanty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks! Sorry this one took so long to get through

@dbanty
Copy link
Collaborator

dbanty commented Mar 13, 2021

Codecov seems to be having some issues reporting final coverage. Can you pull in main to make the checks re-run?

@forest-benchling
Copy link
Collaborator Author

@dbanty Merged in main and added another unit test, looks like checks pass now!

@dbanty dbanty merged commit f9f0963 into openapi-generators:main Mar 15, 2021
@eli-bl eli-bl deleted the forest-union-3 branch November 26, 2024 22:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants